Abstract 

We show that the existence of a computationally efficient calibration algorithm, 
with a low weak calibration rate, would imply the existence of an efficient algo- 
rithm for computing approximate Nash equilibria — thus implying the unlikely 
conclusion that every problem in PPAD is solvable in polynomial time. 

1 Introduction 

Consider a weather forecaster that predicts the probability of rain. The forecaster is said 
to be calibrated if every time she predicts a certain probability of rain, the empirical 
average of rainy vs. non-rainy days approaches this forecasted probability. 

This very natural property of forecasting was introduced by [Daw82] and has found 
numerous applications since [FV97, FV98, KLS99, Fos99, FL99, MSA07, Per09, MS10, 
RST11]. See [CL06] for a more detailed bibliographic survey. 

[FV98] provided the first randomized calibration algorithms. Subsequently, numerous 
other algorithms have been developed based on a variety of techniques: 
Blackwell approachability [Fos99], internal-regret minimization [FV98] and 
online convex optimization [ABH11], to name a few. 

While existence results for calibration are well established, our understanding of 
the statistical and computational complexity is murkier. The statistical complexity 
can be thought of as the number of rounds it takes to achieve some natural notion 
of a low calibration rate; the computational complexity can be thought of as the net 
computation time to achieve this. This work provides a lower bound for the latter. When 
characterizing the efficiency of algorithms, the critical issue is the relationship between 
the relevant parameters and the desired notion of calibration. The notion of the (total) 
calibration rate (at precision $\varepsilon$) is essentially that defined by [FV98]. The relevant 
parameters are the number of forecasting iterations (henceforth denoted $T$), the precision 
of calibration $\varepsilon$, and the number of possible outcomes in the forecasting game, $d$. A variant 
of this question was posed as an open problem in [AM11].¹ 

¹ [AM11] did not explicitly pose this question in terms of net computation time. 

In this work, we give a negative result showing that calibration (in the worst case) 
is hard, under a widely-believed computational complexity assumption. In particular, 
we utilize a natural (smooth) notion of calibration at scale $\varepsilon$, namely weak calibration 
(as in [KF08]). Precisely, the complexity implication of our main result, Theorem 3, is 
as follows: 

Corollary 1. Suppose there exists a constant $c > 0$ and a weak calibration algorithm 
which, for every precision $\varepsilon > 0$, attains a calibration rate of $\varepsilon^c$ in a total 
computational running time (in the RAM model) that is polynomial in both $d$ and $\frac{1}{\varepsilon}$. Then 
$\mathrm{PPAD} \subseteq \mathrm{RP}$. 

Here, the weak calibration rate is a cumulative notion of error, precisely defined 
in Section 2. RP stands for the complexity class of randomized polynomial time; 
PPAD is the class of problems that are polynomial-time reducible to the problem of 
computing a Nash equilibrium in a two-player game (see [Pap94, Das09]). It is widely 



believed that PPAD is not contained in RP. Note that we are considering the total 
computation time over all T rounds (so there is no explicit T dependence). 



2 Calibration 



Calibration inherently concerns distributions, and when comparing distributions it makes 
sense to talk about statistical distance or its closely related cousin the $\ell_1$ norm, rather 
than the Euclidean norm. Therefore throughout we use $\|\cdot\|$ to denote the $\ell_1$ norm and 
$\|\cdot\|_p$ to denote the $\ell_p$ norm. 

We let $\{1, 2, \ldots, d\}$ be an outcome space, and $X_1, X_2, \ldots, X_T$ be a sequence of 
outcomes, denoted as $X_t \in \{0,1\}^d$, such that $X_t(i)$ is one if and only if the outcome 
in iteration $t$ is $i \in [d]$. Hence $\frac{1}{T}\sum_t X_t$ is the empirical frequency of outcomes. 

A randomized forecaster $\mathcal{A}$ produces a sequence of probability distributions $\mathcal{D}_1, \ldots, \mathcal{D}_T$ 
over the set $\Delta_d = \{p \in \mathbb{R}^d : p_i \ge 0, \sum_i p_i = 1\}$. In every iteration a point in the interior 
of the simplex is chosen: $p_t \sim \mathcal{D}_t$, which constitutes the forecast of $\mathcal{A}$. 

Strong Calibration: For a set of points $V \subseteq \Delta_d$, define the following "test" functions 
(where the arg min breaks ties arbitrarily): 

$$I_p(q) = \begin{cases} 1 & \text{if } p = \arg\min_{p' \in V} \|p' - q\| \\ 0 & \text{otherwise} \end{cases}$$

We say this set of test functions is at precision $\varepsilon$ if every $q \in \Delta_d$ is 
$\varepsilon$-close (in $\ell_1$) to some point in $V$, i.e. for all $q \in \Delta_d$ we have $\min_{p \in V} \|p - q\| \le \varepsilon$ 
(that is, the set $V$ is an $\varepsilon$-cover for $\Delta_d$). 

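Definition 1. Let the strong-calibration rate of a (possibly randomized) forecaster $\mathcal{A}$ 
with respect to indicator test functions $\mathcal{I}_\varepsilon = \{I_p\}$ at precision $\varepsilon$ be 

$$C_T(X_{1:T}, \mathcal{A}, \mathcal{I}_\varepsilon) = \mathbb{E}_{\mathcal{D}_1, \ldots, \mathcal{D}_T}\left[ \sum_{p \in V} \left\| \frac{1}{T} \sum_{t=1}^{T} I_p(p_t)\,(p_t - X_t) \right\| \right]$$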

This definition is closely related to that used in [BL85, FV98]; the latter definition 
is motivated by a bias-variance decomposition of the Brier score. The distinctions are 
that [FV98] use the squared $\ell_2$ error (while we use the $\ell_1$ norm, primarily for convenience) 
and that [FV98] restrict $\mathcal{A}$ to make predictions which lie in $V$ (a minor distinction). 

Much of the literature is concerned with the asymptotic behavior, without explicitly 
characterizing the finite time rate. It is standard to say that a forecaster $\mathcal{A}$ is (strongly) 
asymptotically calibrated if for all $X_{1:T}$, we can drive $C_T(\mathcal{A}, \mathcal{I}_\varepsilon)$ to $0$ as $T \to \infty$. 
If $\mathcal{A}$ is restricted to make predictions in the set $V$, then this notion seeks to drive 
$C_T(\mathcal{A}, \mathcal{I}_\varepsilon) \le \varepsilon$ in the limit. In this work, the rate of this function is critical. 

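The definition of asymptotic calibration considers the "total error" over an $\varepsilon$-grid, 
and it adjusts the normalization for each term to $\frac{1}{T}$. Note that our indicator functions 
satisfy for all $q \in \Delta_d$: 

$$\sum_{p \in V} I_p(q) = 1 \qquad (1)$$

since every $q$ is covered by exactly one indicator function. This implies that: 

$$\frac{1}{T} \sum_{p \in V} \sum_{t=1}^{T} I_p(p_t) = 1$$

which, since each difference $p_t - X_t$ has $\ell_1$ norm at most 2, implies that 
$C_T(X_{1:T}, \mathcal{A}, \mathcal{I}_\varepsilon)$ is bounded by 2. 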
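
To make the strong-calibration sum concrete, here is a minimal sketch that evaluates it for one realized sequence of forecasts and outcomes (the expectation over the forecaster's randomness is left out). The function names and the three-point cover used in the example are assumptions made for illustration; they do not come from the text.

```python
def l1_distance(u, v):
    """l_1 distance between two vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(u, v))


def strong_calibration_rate(forecasts, outcomes, cover):
    """Sum over cover points p of || (1/T) * sum_t I_p(p_t) (p_t - X_t) ||_1.

    forecasts: list of probability vectors p_t
    outcomes:  list of 0/1 indicator vectors X_t
    cover:     list of points V; each forecast is assigned to its nearest
               point in l_1 (ties go to the first such point)
    """
    T = len(forecasts)
    d = len(cover[0])
    # One running sum of (p_t - X_t) per cover point.
    sums = [[0.0] * d for _ in cover]
    for p_t, x_t in zip(forecasts, outcomes):
        nearest = min(range(len(cover)), key=lambda k: l1_distance(cover[k], p_t))
        for i in range(d):
            sums[nearest][i] += p_t[i] - x_t[i]
    return sum(sum(abs(s) / T for s in row) for row in sums)


# Hypothetical example with d = 2 and outcomes alternating between the two values.
cover = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
outcomes = [(1, 0), (0, 1), (1, 0), (0, 1)]
print(strong_calibration_rate([(0.5, 0.5)] * 4, outcomes, cover))  # calibrated forecasts
print(strong_calibration_rate([(1.0, 0.0)] * 4, outcomes, cover))  # overconfident forecasts
```

In this toy run the constant forecast (0.5, 0.5) matches the empirical frequencies and gives rate 0, while always forecasting (1, 0) gives rate 1, within the bound of 2 noted above.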

Weak Calibration: We now turn to the notion of weak calibration, which covers $\Delta_d$ 
in a more continuous manner. The weak calibration rate is more naturally defined by 
a triangulation of the simplex $\Delta_d$. By this, we mean that $\Delta_d$ is partitioned into a set 
of simplices such that any two simplices intersect in either a common face, a common 
vertex, or not at all. Let $V$ be the vertex set of this triangulation. Note that any point $q$ 
lies in some simplex in this triangulation, and, slightly abusing notation, let $V(q)$ be the 
set of corners of this simplex. Note that the function $V(\cdot)$ specifies the triangulation. 

Instead of indicator functions $I_p(\cdot)$, we associate a test function $\omega_p(\cdot)$ with each 
$p \in V$ as follows. Each $q \in \Delta_d$ can be uniquely written as a weighted average of its 
neighboring vertices, $V(q)$. For $p \in V(q)$, let us define the test functions $\omega_p(q)$ to be 
these linear weights, so they are uniquely defined by the linear equation: 

$$q = \sum_{p \in V(q)} \omega_p(q)\, p$$

For $p \notin V(q)$, we let $\omega_p(q) = 0$. We refer to this set of functions as the triangulated 
test functions with regard to $V(\cdot)$ and say that this is at precision $\varepsilon$ if the diameter of 
the set of points $V(q)$ is less than $\varepsilon$ for all $q$. 
A useful property is that for all $q \in \Delta_d$, 

$$\sum_{p \in V} \omega_p(q) = 1 \qquad (2)$$

since $q$ lies in the convex hull of $V(q)$. In comparison to Equation (1), these test 
functions cover $\Delta_d$ in a smoother manner: they again sum to 1, and each $\omega_p(q)$ is 
a continuous function (as opposed to the discontinuous indicator functions). 
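
For intuition, the triangulated test functions can be written out explicitly in the simplest case d = 2. The sketch below assumes the simplex is parametrized by the probability x of the first outcome and is triangulated by the uniform grid 0, 1/m, ..., 1; the function name and the grid are illustrative assumptions, not a construction from the text.

```python
def triangulated_weights(x, m):
    """Weights omega_p(q) for q = (x, 1 - x) on the grid p_k = (k/m, 1 - k/m).

    Returns a list of m + 1 weights: only the two grid points bracketing x
    are nonzero, and the weights are the linear interpolation coefficients.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    weights = [0.0] * (m + 1)
    position = x * m
    left = min(int(position), m - 1)  # index of the segment containing x
    frac = position - left
    weights[left] = 1.0 - frac
    weights[left + 1] = frac
    return weights


w = triangulated_weights(0.25, 2)
print(w)       # weights on the vertices 0, 1/2 and 1
print(sum(w))  # the weights sum to one, as in Equation (2)
```

With this grid each segment has $\ell_1$ diameter 2/m, so the test functions are at any precision larger than 2/m. Unlike the indicator functions, the weights change continuously as x moves across a grid point.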

We now define deterministic calibration algorithms, so-called "weak calibration", 
with regard to these Lipschitz test functions. 

Definition 2. Let $\mathcal{W}_\varepsilon = \{\omega_p\}$ be a set of triangulated test functions at precision $\varepsilon$. 
The weak-calibration rate for a (deterministic) forecaster $\mathcal{A}$ with respect to $\mathcal{W}_\varepsilon$ is: 

$$C_T(X_{1:T}, \mathcal{A}, \mathcal{W}_\varepsilon) = \sum_{p \in V} \left\| \frac{1}{T} \sum_{t=1}^{T} \omega_p(p_t)\,(p_t - X_t) \right\|$$

[KF08] showed that there exist deterministic calibration algorithms (also see [MSA07]). 
Again, note the normalization property: 

$$\frac{1}{T} \sum_{p \in V} \sum_{t=1}^{T} \omega_p(p_t) = 1$$

which implies that $C_T(X_{1:T}, \mathcal{A}, \mathcal{W}_\varepsilon)$ is bounded by 2. 
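
As a small illustration of the weak-calibration rate, the sketch below evaluates it for binary outcomes (d = 2). It assumes forecasts are given as the probability x_t of the first outcome, outcomes as y_t in {0, 1}, and the triangulation is the uniform grid 0, 1/m, ..., 1; these conventions and all names are assumptions for the example only.

```python
def hat_weights(x, m):
    """Nonzero triangulated weights of x in [0, 1] on the grid k/m, as {index: weight}."""
    position = x * m
    left = min(int(position), m - 1)
    frac = position - left
    return {left: 1.0 - frac, left + 1: frac}


def weak_calibration_rate(forecasts, outcomes, m):
    """Sum over grid vertices p of || (1/T) * sum_t omega_p(p_t) (p_t - X_t) ||_1.

    For d = 2 we have p_t - X_t = (x_t - y_t, -(x_t - y_t)), so the l_1 norm
    of each vertex's weighted sum is twice the absolute scalar sum.
    """
    T = len(forecasts)
    signed = [0.0] * (m + 1)
    for x_t, y_t in zip(forecasts, outcomes):
        for k, weight in hat_weights(x_t, m).items():
            signed[k] += weight * (x_t - y_t)
    return sum(2.0 * abs(s) / T for s in signed)


print(weak_calibration_rate([0.5] * 4, [1, 0, 1, 0], m=2))   # forecasts match frequencies
print(weak_calibration_rate([0.25] * 4, [1, 0, 0, 0], m=2))  # off-grid forecast, still calibrated
print(weak_calibration_rate([1.0] * 4, [1, 0, 1, 0], m=2))   # overconfident forecasts
```

The second call shows the smoothing at work: the forecast 0.25 is not a grid vertex, so its error is split between the two neighboring vertices rather than assigned to a single bucket.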



3 Main Result 

Our main result is based on using a calibration algorithm to compute a Nash equilib- 
rium of a two player game. Before we state our main result, let us review the definition 
of an approximate Nash equilibrium, along with the attendant computational complex- 
ity results. 

3.1 Nash equilibria in games 

A (square) two-player bi-matrix game is defined by two payoff matrices $U_1, U_2 \in 
\mathbb{R}^{n \times n}$, such that if the row and column players choose pure strategies $i, j \in [n]$, respectively, 
the payoffs to the row and column players are $U_1(i,j)$ and $U_2(i,j)$, respectively. 

A mixed strategy for a player is a distribution over pure strategies (i.e. rows/columns), 
and for brevity we may refer to it simply as a strategy. An $\varepsilon$-approximate Nash 
equilibrium is a pair of mixed strategies $(p, q)$ such that 

$$\forall i \in [n]: \quad p^\top U_1 q \ge e_i^\top U_1 q - \varepsilon,$$
$$\forall j \in [n]: \quad p^\top U_2 q \ge p^\top U_2 e_j - \varepsilon.$$

Here and throughout, $e_i$ is the $i$-th standard basis vector, i.e. 1 in the $i$-th coordinate and 0 
in all other coordinates. If $\varepsilon = 0$, the strategy pair is called a Nash equilibrium (NE). 

For notational convenience, we slightly abuse notation by denoting the payoffs of 
mixed strategies as: 

$$U_1(p,q) = p^\top U_1 q, \qquad U_2(p,q) = p^\top U_2 q$$

The definition immediately implies that the pair $(x,y)$ is an $\varepsilon$-equilibrium if and 
only if for all mixed strategies $\tilde{x}, \tilde{y}$, 

$$U_1(x,y) \ge U_1(\tilde{x},y) - \varepsilon,$$
$$U_2(x,y) \ge U_2(x,\tilde{y}) - \varepsilon.$$
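
The definition can be checked mechanically: a pair of mixed strategies is an approximate equilibrium exactly when neither player gains more than the tolerance by switching to a pure strategy. The sketch below computes the largest such gain; the function name and the example game (a matching-pennies variant with payoffs in [0, 1]) are illustrative assumptions.

```python
def nash_gap(U1, U2, p, q):
    """Smallest eps for which (p, q) is an eps-approximate Nash equilibrium.

    U1, U2: n x n payoff matrices (lists of rows) for the row and column player
    p, q:   mixed strategies of the row and column player
    """
    n = len(p)
    payoff1 = sum(p[i] * U1[i][j] * q[j] for i in range(n) for j in range(n))
    payoff2 = sum(p[i] * U2[i][j] * q[j] for i in range(n) for j in range(n))
    # Best pure deviation of the row player against q: max_i e_i^T U1 q.
    best1 = max(sum(U1[i][j] * q[j] for j in range(n)) for i in range(n))
    # Best pure deviation of the column player against p: max_j p^T U2 e_j.
    best2 = max(sum(p[i] * U2[i][j] for i in range(n)) for j in range(n))
    return max(best1 - payoff1, best2 - payoff2)


# Hypothetical game: the row player wants to match, the column player to mismatch.
U1 = [[1, 0], [0, 1]]
U2 = [[0, 1], [1, 0]]
print(nash_gap(U1, U2, [0.5, 0.5], [0.5, 0.5]))  # uniform play
print(nash_gap(U1, U2, [1.0, 0.0], [1.0, 0.0]))  # both play the first pure strategy
```

Checking only pure deviations suffices because the payoff is linear in each player's own mixed strategy, which is why the pure-strategy and mixed-strategy forms of the definition agree.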

Algorithm 1 Approximate NE computation via calibration algorithm $\mathcal{A}$ 

Input: calibration algorithm $\mathcal{A}$ along with $\mathcal{W}_\varepsilon$ on the outcome space $\{0,1\}^d \times \{0,1\}^d$; 
two-player game $U_1, U_2$ over $\Delta_d \times \Delta_d$. 
Initialize: Set $\delta = \varepsilon^{1/3}$ and $p_1$ to be $\mathcal{A}(\emptyset)$. 
for $t = 1, 2, \ldots, T$ do 
    Let $[p_t]_1$ and $[p_t]_2$ denote the marginal distributions of $p_t$ with respect to the first 
    and second coordinates (respectively). 
    Sample the outcome $X_t \in \{0,1\}^d \times \{0,1\}^d$ according to the product distribution: 
        $X_t \sim \mathrm{BR}_{1,\delta}([p_t]_2) \times \mathrm{BR}_{2,\delta}([p_t]_1)$ 
    where $\mathrm{BR}_{i,\delta}$ is a smooth best-response function, defined in Section 4.1. 
    Update $p_{t+1} \leftarrow \mathcal{A}(X_1, \ldots, X_t)$ 
end for 
Sample $t$ uniformly from $\{1, \ldots, T\}$. 
Sample $p \in V(p_t)$ under the law $\Pr(p \mid p_t) = \omega_p(p_t)$. 
return $\mathrm{BR}_\delta(p) = (\mathrm{BR}_{1,\delta}([p]_2), \mathrm{BR}_{2,\delta}([p]_1))$ 
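
The control flow of Algorithm 1 can be sketched in a few lines. This is only a skeleton under stated assumptions: the forecaster, the two smooth best-response maps and the triangulation weights are passed in as hypothetical callables (`forecaster`, `smooth_br1`, `smooth_br2`, `vertex_weights`), joint forecasts are represented as d x d nested lists, and outcomes are stored as index pairs rather than indicator vectors. None of these names or representations come from the text.

```python
import random


def marginals(joint, d):
    """Marginals of a d x d joint distribution over (first, second) coordinates."""
    first = [sum(joint[i][j] for j in range(d)) for i in range(d)]
    second = [sum(joint[i][j] for i in range(d)) for j in range(d)]
    return first, second


def sample_index(dist, rng):
    """Sample an index from a finite distribution given as a sequence of weights."""
    r = rng.random()
    acc = 0.0
    for i, w in enumerate(dist):
        acc += w
        if r < acc:
            return i
    return len(dist) - 1  # guard against rounding in the cumulative sum


def approximate_ne_via_calibration(forecaster, smooth_br1, smooth_br2,
                                   vertex_weights, d, T, rng):
    history, forecasts = [], []
    p_t = forecaster(history)  # p_1 = A(empty history)
    for _ in range(T):
        forecasts.append(p_t)
        m1, m2 = marginals(p_t, d)
        # Outcome X_t is drawn from the product of smooth best responses.
        row = sample_index(smooth_br1(m2), rng)
        col = sample_index(smooth_br2(m1), rng)
        history.append((row, col))
        p_t = forecaster(history)  # p_{t+1} = A(X_1, ..., X_t)
    p_t = rng.choice(forecasts)  # sample t uniformly from {1, ..., T}
    vertices, weights = zip(*vertex_weights(p_t))  # pairs (p, omega_p(p_t))
    p = vertices[sample_index(weights, rng)]
    m1, m2 = marginals(p, d)
    return smooth_br1(m2), smooth_br2(m1)


# Smoke test with trivial stand-ins for every component.
uniform = [[0.25, 0.25], [0.25, 0.25]]
print(approximate_ne_via_calibration(
    forecaster=lambda history: uniform,
    smooth_br1=lambda q: [0.5, 0.5],
    smooth_br2=lambda q: [0.5, 0.5],
    vertex_weights=lambda p: [(p, 1.0)],
    d=2, T=5, rng=random.Random(0)))
```

The stand-ins only exercise the loop; a real instantiation would need an actual weak calibration algorithm and the smooth best responses of Section 4.1.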



As we are concerned with an additive notion of approximation, we assume that the 
entries of the matrices are in the range $[0,1]$. In particular this implies that the functions 
$U_1, U_2$ are 1-Lipschitz w.r.t. the $\ell_1$ norm, since for all $p_1, p_2, q \in \Delta_d$: 

$$U_i(p_1, q) - U_i(p_2, q) = (p_1 - p_2)^\top U_i\, q \le \|p_1 - p_2\| \, \|U_i\, q\|_\infty \le \|p_1 - p_2\| \qquad (3)$$

where we used Hölder's inequality and the fact that the entries of $U_i$ lie in $[0,1]$. 
The following theorem was provided by [CDT09]: 

Theorem 2. [CDT09] If there exists a randomized algorithm that computes an $\varepsilon$-NE in 
a two-player game in time $\mathrm{poly}(d, \frac{1}{\varepsilon})$ then $\mathrm{PPAD} \subseteq \mathrm{RP}$. 

3.2 Nash equilibria computation with a calibration algorithm 

We now present the reduction from weak calibration to computing equilibria in 
games, thereby obtaining the hardness result stated in Corollary 1. Algorithm 1 utilizes 
a calibration algorithm in a specially tailored game-theoretic protocol. Observe that this 
protocol is run with an outcome space of size $d^2$. This protocol is based on the ideas in 
[KF08], which utilized a weak calibration algorithm to obtain asymptotic convergence 
to the convex hull of Nash equilibria (also see [MSA07]). Here, our algorithm outputs 
a particular approximate Nash equilibrium in finite time, which allows us to provide a 
computational complexity lower bound. 

Theorem 3. Suppose a weak calibration algorithm $\mathcal{A}$ satisfies the following uniform 
bound on the calibration rate: $C_T(X_{1:T}, \mathcal{A}, \mathcal{W}_\varepsilon) \le F(d, \mathcal{W}_\varepsilon, T)$ (where $F$ does not 
depend on $X_{1:T}$). Let $d \ge 2$ and $\varepsilon \le \frac{1}{d^3}$. Then with probability greater than $1/2$, 
Algorithm 1 (using $\delta = \varepsilon^{1/3}$) returns a $(4F(d^2, \mathcal{W}_\varepsilon, T) + 22 d \varepsilon^{1/3})$-Nash equilibrium. 

This directly implies Corollary 1, as follows: 

Proof of Corollary 1. Let $\mathcal{A}$ be a weak calibration algorithm that attains a calibration rate of 
$\varepsilon^c$ at precision $\varepsilon$. Then for some $T$ (where $T$ is polynomial in $\frac{1}{\varepsilon}$ and $d$) we have that 
$C_T(X_{1:T}, \mathcal{A}, \mathcal{W}_\varepsilon) \le F(d^2, \mathcal{W}_\varepsilon, T) \le \varepsilon^c$. Theorem 3 implies that Algorithm 1 returns 
an $O(\varepsilon^c + d\varepsilon^{1/3})$-NE after $T$ iterations with probability greater than $\frac{1}{2}$. This constitutes 
a randomized polynomial time algorithm for $\varepsilon$-NE, which by Theorem 2 implies 
$\mathrm{PPAD} \subseteq \mathrm{RP}$. □ 



4 Analysis 

Our analysis is arranged into three parts. First, we define a smooth best response 
function $\mathrm{BR}_\delta$ along with some technical lemmas. Then we show how fixed points of this 
$\mathrm{BR}_\delta$ function are approximate Nash equilibria. With these lemmas, we complete the 
proof. 

4.1 Smooth Best Response Functions 

Our algorithm utilizes smooth best response functions. For a mixed strategy $q \in \Delta_d$, 
define the best response functions as: 

$$\mathrm{BR}_i(q) = \arg\max_{p \in \Delta_d} \{ U_i(p, q) \}$$

In case the RHS is a set, define $\mathrm{BR}_i$ as an arbitrary member of the set. 

We say that a function $g : \Delta_d \to \Delta_d$ is an $\varepsilon$-best response with respect to $U_i$ if the 
following holds: 

$$\forall q: \quad U_i(g(q), q) \ge U_i(\mathrm{BR}_i(q), q) - \varepsilon$$

It is convenient to extend the best response function beyond the simplex. Define 
for any point in Euclidean space: 

$$\forall p \in \mathbb{R}^n: \quad \mathrm{BR}_i(p) = \mathrm{BR}_i\big(\Pi_{\Delta_d}(p)\big)$$

where $\Pi_K(p)$ denotes the projection operation onto a convex set $K$, defined as: 

$$\Pi_K(p) = \arg\min_{q \in K} \|p - q\|_2$$

Using the generalized definition of $\mathrm{BR}_i$, define the $\delta$-smooth best response function 
as: 

$$\mathrm{BR}_{i,\delta}(q) := \mathbb{E}_{\|q' - q\|_\infty \le \delta} \left[ \mathrm{BR}_i(q') \right] \qquad (4)$$

where the expectation is with respect to the random $q'$ sampled uniformly on the set 
$\{q' : \|q' - q\|_\infty \le \delta\}$. 

Lemma 4. The function $\mathrm{BR}_{i,\delta}$ is a $(2d\delta)$-best response with respect to $U_i$. 
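
The smoothed best response in (4) is an average of exact best responses over a small cube around the opponent's strategy, so it can be approximated by sampling. The sketch below is a Monte Carlo approximation for the row player only, under these assumptions: the payoff of pure strategy i against q is row i of U times q, ties are broken toward the lowest index, and the Euclidean projection onto the simplex uses the standard sort-and-threshold method. The function names and sample counts are illustrative choices, not part of the text.

```python
import random


def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort and threshold)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:
            theta = candidate  # keep the threshold of the largest valid prefix
    return [max(x - theta, 0.0) for x in v]


def best_response(U, q):
    """A pure best response (as a basis vector) of the row player against q."""
    payoffs = [sum(U[i][j] * q[j] for j in range(len(q))) for i in range(len(U))]
    best = max(range(len(payoffs)), key=payoffs.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(payoffs))]


def smooth_best_response(U, q, delta, samples=1000, rng=None):
    """Monte Carlo estimate of E[BR(q')] for q' uniform in the l_inf ball around q."""
    rng = rng or random.Random(0)
    total = [0.0] * len(U)
    for _ in range(samples):
        q_prime = [q_j + rng.uniform(-delta, delta) for q_j in q]
        response = best_response(U, project_to_simplex(q_prime))
        total = [t + r for t, r in zip(total, response)]
    return [t / samples for t in total]


U = [[1, 0], [0, 1]]  # hypothetical payoffs: the row player wants to match
print(smooth_best_response(U, [0.9, 0.1], delta=0.05))  # far from indifference
print(smooth_best_response(U, [0.5, 0.5], delta=0.1))   # near indifference: a mixture
```

Far from the indifference point the smoothed response coincides with the exact best response; near it, the output moves gradually between the two pure strategies, which is the continuity that Lemma 5 quantifies.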
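
Proof. Let $q, q'$ be such that $\|q - q'\|_\infty \le \delta$. Hence, $\|q' - q\| \le d\delta$ and since $U_i$ is 
1-Lipschitz with respect to the $\ell_1$ norm (see equation (3)): 

$$\forall p: \quad |U_i(p, q') - U_i(p, q)| \le \|q' - q\| \le d\delta$$

Let $q' = \arg\min_{\tilde{q} \in \Delta_d,\ \|\tilde{q} - q\|_\infty \le \delta} U_i(\mathrm{BR}_i(\tilde{q}), q)$. Using the definitions above, we have 

$$\begin{aligned}
U_i(\mathrm{BR}_{i,\delta}(q), q) &= U_i\Big( \mathbb{E}_{\|q' - q\|_\infty \le \delta} [\mathrm{BR}_i(q')],\ q \Big) \\
&\ge U_i(\mathrm{BR}_i(q'), q) \\
&\ge U_i(\mathrm{BR}_i(q'), q') - d\delta && \text{since } \|q' - q\|_\infty \le \delta \\
&\ge U_i(\mathrm{BR}_i(q), q') - d\delta && \text{definition of } \mathrm{BR}_i \\
&\ge U_i(\mathrm{BR}_i(q), q) - 2d\delta && \text{since } \|q' - q\|_\infty \le \delta
\end{aligned}$$

which completes the proof. □ 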
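Lemma 5. For $2 \le d \le \frac{1}{\delta}$, the function $\mathrm{BR}_{i,\delta}$ is $\frac{2}{\delta^2}$-Lipschitz. 
Proof. Consider any two distributions $p, q$. We consider two cases: 

case 1: $\|p - q\|_\infty \ge \delta^2$. In this case we have 

$$\begin{aligned}
\|\mathrm{BR}_{i,\delta}(p) - \mathrm{BR}_{i,\delta}(q)\| &\le \|\mathrm{BR}_{i,\delta}(p)\| + \|\mathrm{BR}_{i,\delta}(q)\| && \text{triangle inequality} \\
&\le 2 && \text{the range of } \mathrm{BR}_{i,\delta} \text{ is } \Delta_d \\
&\le \|p - q\|_\infty \cdot \frac{2}{\delta^2} && \text{by condition on } \|p - q\|_\infty \\
&\le \|p - q\| \cdot \frac{2}{\delta^2}
\end{aligned}$$

case 2: $\|p - q\|_\infty < \delta^2$. Denote the $d$-dimensional cube with radius $\delta$ centered at $p$ 
by 

$$C^d_\delta(p) = C_\delta(p) = \{q \in \Delta_d : \|q - p\|_\infty \le \delta\}$$

We have 

$$\begin{aligned}
\|\mathrm{BR}_{i,\delta}(p) - \mathrm{BR}_{i,\delta}(q)\| &= \Big\| \mathbb{E}_{\|p' - p\|_\infty \le \delta} [\mathrm{BR}_i(p')] - \mathbb{E}_{\|q' - q\|_\infty \le \delta} [\mathrm{BR}_i(q')] \Big\| \\
&= \Big\| \mathbb{E}_{p' \in C_\delta(p)} [\mathrm{BR}_i(p')] - \mathbb{E}_{q' \in C_\delta(q)} [\mathrm{BR}_i(q')] \Big\| \\
&\le 2 \cdot \frac{\mathrm{vol}\big(C_\delta(p) \setminus C_\delta(q) \,\cup\, C_\delta(q) \setminus C_\delta(p)\big)}{\mathrm{vol}\big(C_\delta(p) \cup C_\delta(q)\big)} \\
&\le 2 \cdot \frac{\mathrm{vol}\big(C_\delta(p) \setminus C_\delta(q)\big)}{\mathrm{vol}\big(C_\delta(q)\big)}
\end{aligned}$$

The volume of $C_\delta(x)$ for any $x \in \mathbb{R}^d$ is given by $\delta^d$. To bound the volume of $C_\delta(p) \setminus 
C_\delta(q)$, notice that at least one coordinate of any point in this set is within distance $\delta$ of 
$p$ but not of $q$. Hence, the range of possible values for this coordinate is bounded by 
$\|p - q\|_\infty$. This is possible for all $d$ coordinates, and we obtain: 

$$\mathrm{vol}\big(C^d_\delta(p) \setminus C_\delta(q)\big) \le \|p - q\|_\infty \cdot d \cdot \mathrm{vol}\big(C^{d-1}_\delta(p)\big) \le d\, \|p - q\|_\infty\, \delta^{d-1}$$

We conclude that: 

$$\|\mathrm{BR}_{i,\delta}(p) - \mathrm{BR}_{i,\delta}(q)\| \le 2 \cdot \frac{\mathrm{vol}\big(C_\delta(p) \setminus C_\delta(q)\big)}{\mathrm{vol}\big(C_\delta(q)\big)} \le \frac{2\, \|p - q\|_\infty\, d\, \delta^{d-1}}{\delta^d} \le \frac{2d}{\delta} \|p - q\|_\infty \le \frac{2}{\delta^2} \|p - q\|_\infty$$

which completes the proof. □ 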

4.2 Approximate Nash equilibria and fixed points 

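Lemma 6. (Approximate NE are Approximate Fixed Points) Let $p$ be a (possibly joint) 
distribution on the space of outcomes $\{0,1\}^d \times \{0,1\}^d$; let $[p]_1$ and $[p]_2$ denote the 
marginal distributions of $p$ with respect to the first and second coordinates (respectively); 
let $\mathrm{BR}_\delta(p)$ denote the product distribution $\mathrm{BR}_{1,\delta}([p]_2) \times \mathrm{BR}_{2,\delta}([p]_1)$. 
Suppose 

$$\|p - \mathrm{BR}_\delta(p)\| \le \gamma$$

Then $\mathrm{BR}_\delta(p)$ is a $(2\gamma + 2d\delta)$-NE. 

Proof. By construction, $\mathrm{BR}_\delta(p)$ is a product distribution. Hence, it suffices to show 
that $\mathrm{BR}_{1,\delta}([p]_2)$ is a $(2\gamma + 2d\delta)$-best response to $\mathrm{BR}_{2,\delta}([p]_1)$ (and vice versa). First, 
observe that: 

$$\|[q]_1 - [p]_1\| = \sum_{i=1}^{d} \Big| \sum_{j=1}^{d} \big(q(i,j) - p(i,j)\big) \Big| \le \sum_{i,j=1}^{d} |q(i,j) - p(i,j)| = \|q - p\| \qquad (5)$$

Similarly, $\|[q]_2 - [p]_2\| \le \|q - p\|$. Hence, for $i \in \{1, 2\}$, 

$$\|[p]_i - [\mathrm{BR}_\delta(p)]_i\| \le \|p - \mathrm{BR}_\delta(p)\| \le \gamma$$

By Lemma 4, $\mathrm{BR}_{1,\delta}([p]_2)$ is a $2d\delta$-best response to $[p]_2$. Since $\|[p]_2 - \mathrm{BR}_{2,\delta}([p]_1)\| \le 
\gamma$, we have that for all $q \in \Delta_d$, 

$$|U_1(q, [p]_2) - U_1(q, \mathrm{BR}_{2,\delta}([p]_1))| \le \gamma$$

Hence, for all $q \in \Delta_d$, 

$$\begin{aligned}
U_1(\mathrm{BR}_{1,\delta}([p]_2), \mathrm{BR}_{2,\delta}([p]_1)) &\ge U_1(\mathrm{BR}_{1,\delta}([p]_2), [p]_2) - \gamma \\
&\ge U_1(q, [p]_2) - \gamma - 2d\delta \\
&\ge U_1(q, \mathrm{BR}_{2,\delta}([p]_1)) - 2\gamma - 2d\delta
\end{aligned}$$

which proves the claim. □ 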



5 Proof (of Theorem 3) 

Three observations are helpful for intuition in the proof: 

• By construction in Algorithm 1, in expectation, the outcomes $X_t$ are just $\mathrm{BR}_\delta(p_t)$. 
Precisely, $\mathbb{E}[X_t \mid X_1, \ldots, X_{t-1}] = \mathrm{BR}_\delta(p_t)$. 

• Suppose $\omega_p(p_t)$ is nonzero (so $\|p - p_t\| \le \varepsilon$). Then, by Lemma 5, the larger $\delta$ 
is, the closer $\mathrm{BR}_\delta(p_t)$ and $\mathrm{BR}_\delta(p)$ will be to each other. 

• The smaller $\delta$ is, the more accurate an approximate NE we have for an approximate 
fixed point of $\mathrm{BR}_\delta$ (by Lemma 6). 

The proof of Theorem 3 is a consequence of the following lemma. 

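Lemma 7. Let $p$ and $X_{1:T}$ be the random variables defined in Algorithm 1. For $2 \le 
d \le \frac{1}{\delta}$, we have that: 

$$\mathbb{E}\, \|p - \mathrm{BR}_\delta(p)\| \le \mathbb{E}[C_T(X_{1:T}, \mathcal{A}, \mathcal{W}_\varepsilon)] + \varepsilon + \frac{4\varepsilon}{\delta^2}$$

The proof of our main result now follows: 
Proof of Theorem 3. By Markov's inequality, we have that with probability greater than $1/2$ 

$$\|p - \mathrm{BR}_\delta(p)\| \le 2\, \mathbb{E}[C_T(X_{1:T}, \mathcal{A}, \mathcal{W}_\varepsilon)] + 2\varepsilon + \frac{8\varepsilon}{\delta^2} \le 2F(d^2, \mathcal{W}_\varepsilon, T) + 10\varepsilon^{1/3}$$

using the definition of $F$ (on a $d^2$ sized outcome space) and $\delta = \varepsilon^{1/3}$. By applying 
Lemma 6, we have a $(4F(d^2, \mathcal{W}_\varepsilon, T) + 20\varepsilon^{1/3} + 2d\varepsilon^{1/3})$-NE, which completes the 
proof. □ 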
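
For readers who want the constants spelled out, the arithmetic behind the two bounds can be written as follows. This is a worked calculation rather than part of the original argument; it assumes the smoothing term in Lemma 7 is $4\varepsilon/\delta^2$ (the form consistent with the $10\varepsilon^{1/3}$ term above), that $\varepsilon \le 1$, and it writes $F$ for $F(d^2, \mathcal{W}_\varepsilon, T)$ and $\gamma$ for the fixed-point error in Lemma 6.

```latex
\begin{align*}
2\varepsilon + \frac{8\varepsilon}{\delta^{2}}
  &= 2\varepsilon + 8\varepsilon^{1/3}
  && (\delta = \varepsilon^{1/3}) \\
  &\le 10\,\varepsilon^{1/3}
  && (\varepsilon \le \varepsilon^{1/3} \text{ for } \varepsilon \le 1), \\[4pt]
2\gamma + 2d\delta
  &\le 2\bigl(2F + 10\,\varepsilon^{1/3}\bigr) + 2d\,\varepsilon^{1/3}
  && (\gamma \le 2F + 10\,\varepsilon^{1/3}) \\
  &= 4F + 20\,\varepsilon^{1/3} + 2d\,\varepsilon^{1/3} \\
  &\le 4F + 22\,d\,\varepsilon^{1/3}
  && (d \ge 1).
\end{align*}
```

The last line is the bound stated in Theorem 3.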

We now prove Lemma 7. 
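
Proof of Lemma 7. We proceed by lower bounding the expected calibration rate as follows: 

$$\begin{aligned}
\mathbb{E}[C_T(X_{1:T}, \mathcal{A}, \mathcal{W}_\varepsilon)] &= \mathbb{E}\Big[ \sum_{p \in V} \Big\| \frac{1}{T} \sum_{t=1}^{T} \omega_p(p_t)(p_t - X_t) \Big\| \Big] \\
&\ge \sum_{p \in V} \Big\| \frac{1}{T}\, \mathbb{E}\Big[ \sum_{t=1}^{T} \omega_p(p_t)(p_t - X_t) \Big] \Big\| && \text{Jensen's} \\
&= \sum_{p \in V} \Big\| \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[\omega_p(p_t)(p_t - X_t)] \Big\| && \text{linearity} \\
&= \sum_{p \in V} \Big\| \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\big[\, \mathbb{E}[\omega_p(p_t)(p_t - X_t) \mid X_1, \ldots, X_{t-1}] \,\big] \Big\| \\
&= \sum_{p \in V} \Big\| \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[\omega_p(p_t)(p_t - \mathrm{BR}_\delta(p_t))] \Big\| && p_t \text{ is determined by the history}
\end{aligned}$$

Note that by construction in Algorithm 1, $\mathbb{E}[X_t \mid X_1, \ldots, X_{t-1}] = \mathrm{BR}_\delta(p_t)$, which we 
have used in the last step. 
Hence, we have: 

$$\mathbb{E}[C_T(X_{1:T}, \mathcal{A}, \mathcal{W}_\varepsilon)] \ge \sum_{p \in V} \Big\| \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[\omega_p(p_t)(p - \mathrm{BR}_\delta(p))] \Big\| - \sum_{p \in V} \Big\| \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[\omega_p(p_t)(p - p_t + \mathrm{BR}_\delta(p_t) - \mathrm{BR}_\delta(p))] \Big\|$$

by the triangle inequality. 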
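
For the first term, 

$$\begin{aligned}
\sum_{p \in V} \Big\| \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[\omega_p(p_t)(p - \mathrm{BR}_\delta(p))] \Big\| &= \sum_{p \in V} \Big\| \Big( \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[\omega_p(p_t)] \Big) (p - \mathrm{BR}_\delta(p)) \Big\| \\
&= \frac{1}{T} \sum_{p \in V} \sum_{t=1}^{T} \mathbb{E}[\omega_p(p_t)]\, \|p - \mathrm{BR}_\delta(p)\| \\
&= \mathbb{E}\Big[ \frac{1}{T} \sum_{t=1}^{T} \sum_{p \in V} \omega_p(p_t)\, \|p - \mathrm{BR}_\delta(p)\| \Big] \\
&= \mathbb{E}_{p \sim D}\, \|p - \mathrm{BR}_\delta(p)\|
\end{aligned}$$

where $p \sim D$ is sampled as follows: first, sample $t$ uniformly from $[T]$, then sample 
$p_t$ according to the underlying process, and then sample $p \in V(p_t)$ with probability 
$\omega_p(p_t)$. Note that $D$ is precisely the sampling procedure defined in Algorithm 1. 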
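For the last term, we have that: 

$$\begin{aligned}
\sum_{p \in V} \Big\| \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[\omega_p(p_t)(p - p_t + \mathrm{BR}_\delta(p_t) - \mathrm{BR}_\delta(p))] \Big\| &\le \frac{1}{T} \sum_{p \in V} \sum_{t=1}^{T} \big\| \mathbb{E}[\omega_p(p_t)(p - p_t + \mathrm{BR}_\delta(p_t) - \mathrm{BR}_\delta(p))] \big\| && \text{triangle inequality} \\
&\le \frac{1}{T} \sum_{p \in V} \sum_{t=1}^{T} \mathbb{E}\big[ \| \omega_p(p_t)(p - p_t + \mathrm{BR}_\delta(p_t) - \mathrm{BR}_\delta(p)) \| \big] && \text{Jensen's} \\
&\le \frac{1}{T} \sum_{p \in V} \sum_{t=1}^{T} \mathbb{E}\big[ \omega_p(p_t)\, \|p - p_t\| + \omega_p(p_t)\, \|\mathrm{BR}_\delta(p_t) - \mathrm{BR}_\delta(p)\| \big] && \text{sublinearity}
\end{aligned}$$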

Now observe that for product distributions $D = p(x)q(y)$ and $D' = p'(x)q'(y)$: 

$$\begin{aligned}
\|D - D'\| &= \sum_{x,y} |p(x)q(y) - p'(x)q'(y)| \\
&\le \sum_{x,y} |p(x)q(y) - p(x)q'(y)| + \sum_{x,y} |p(x)q'(y) - p'(x)q'(y)| \\
&= \sum_{x,y} p(x)\, |q(y) - q'(y)| + \sum_{x,y} q'(y)\, |p(x) - p'(x)| \\
&= \|q - q'\| + \|p - p'\|
\end{aligned}$$
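
Both $\ell_1$ facts used in this part of the argument are easy to sanity-check numerically: the distance between two product distributions is at most the sum of the distances between their factors, and taking a marginal never increases the distance (Equation (5)). The sketch below does this for one hand-picked pair of distributions; the helper names and the numbers are assumptions for the example.

```python
def l1_distance(a, b):
    """l_1 distance between two vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def product_distribution(p, q):
    """Joint distribution of independent draws from p and q, flattened row-major."""
    return [p_i * q_j for p_i in p for q_j in q]


def first_marginal(joint, d):
    """Marginal over the first coordinate of a flattened d x d joint distribution."""
    return [sum(joint[i * d + j] for j in range(d)) for i in range(d)]


p, q = [0.7, 0.3], [0.2, 0.8]
p2, q2 = [0.5, 0.5], [0.6, 0.4]
D, D2 = product_distribution(p, q), product_distribution(p2, q2)

joint_gap = l1_distance(D, D2)
factor_gap = l1_distance(p, p2) + l1_distance(q, q2)
marginal_gap = l1_distance(first_marginal(D, 2), first_marginal(D2, 2))
print(joint_gap, factor_gap, marginal_gap)
```

A single example is of course not a proof; it only illustrates the direction of the two inequalities.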



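Also note that since $V(q)$ has diameter $\varepsilon$, if $\omega_p(q) \neq 0$ then $\|p - q\| \le \varepsilon$. Hence, 

$$\begin{aligned}
\|\mathrm{BR}_\delta(p_t) - \mathrm{BR}_\delta(p)\| &\le \|\mathrm{BR}_{1,\delta}([p_t]_2) - \mathrm{BR}_{1,\delta}([p]_2)\| + \|\mathrm{BR}_{2,\delta}([p_t]_1) - \mathrm{BR}_{2,\delta}([p]_1)\| \\
&\le \frac{2\, \|[p_t]_2 - [p]_2\|}{\delta^2} + \frac{2\, \|[p_t]_1 - [p]_1\|}{\delta^2} && \text{by Lemma 5} \\
&\le \frac{4\, \|p_t - p\|}{\delta^2} && \text{by Equation (5)} \\
&\le \frac{4\varepsilon}{\delta^2}
\end{aligned}$$

where we have used Lemma 5 with our condition on $d$. 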
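Hence, for the last term, 

$$\begin{aligned}
\sum_{p \in V} \Big\| \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[\omega_p(p_t)(p - p_t + \mathrm{BR}_\delta(p_t) - \mathrm{BR}_\delta(p))] \Big\| &\le \frac{1}{T} \sum_{p \in V} \sum_{t=1}^{T} \mathbb{E}\Big[ \omega_p(p_t)\, \varepsilon + \omega_p(p_t)\, \frac{4\varepsilon}{\delta^2} \Big] \\
&= \Big( \varepsilon + \frac{4\varepsilon}{\delta^2} \Big)\, \mathbb{E}\Big[ \frac{1}{T} \sum_{t=1}^{T} \sum_{p \in V} \omega_p(p_t) \Big] \\
&= \varepsilon + \frac{4\varepsilon}{\delta^2}
\end{aligned}$$

The claim now follows. □ 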



6 Discussion and Open Problems 

This work provides a computational lower bound for weak calibration, suggesting that 
the hardness of the problem may be fundamentally related to the problem of finding a 
fixed point. The following questions remain open: 

• Is it possible to obtain an efficient algorithm for strong calibration? (One which 
gives a low calibration error in time polynomial in the relevant parameters.) 

• What is the statistical complexity of (weak or strong) calibration? Here, the sta- 
tistical complexity is the number of rounds required to calibrate at some desired 
level of accuracy, without computational considerations. 